Abstract


In [2], Chvatal provided the tight worst case bound of the set covering greedy heuristic. We consider a general class of set covering heuristics. Their worst case bounds are dominated by that of the greedy heuristic.


1.  Introduction 

The Set Covering problem is notoriously hard to solve and is, in fact, NP-complete. A good heuristic algorithm that gives a close approximation to the optimum is therefore desirable. In [2], Chvatal found the tight worst case bound of the greedy heuristic commonly considered in the literature. In this paper, we investigate the worst case behavior of a general class of heuristic algorithms. These worst case bounds are found to be dominated by that of the greedy heuristic.

We consider the Set Covering problem

(1)    $\min \{\, cx \mid Ax \ge e,\ x \text{ binary} \,\}$

where $A = (a_{ij})$ is $m \times n$ with $a_{ij} = 0, 1$ for all $i, j$; $e = (1,\ldots,1)^T$ is $m \times 1$; $x$ is $n \times 1$ and $c \in R^n$ is $1 \times n$. For notational purposes, we define
$M = \{1,\ldots,m\}$ as the set of row indices,
$N = \{1,\ldots,n\}$ as the set of column indices,
$M_j = \{i \in M \mid a_{ij} = 1\}$ for every $j \in N$
and $N_i = \{j \in N \mid a_{ij} = 1\}$ for every $i \in M$.

Any feasible solution is said to be a cover. Any nonredundant cover is said to be prime. If $x_j = 1$ in a feasible solution to (1), variable $j$ is said to cover all rows $i \in M_j$. Without loss of generality, we assume

(2)    $c_j > 0$ for all $j \in N$,
       $M_j \ne \emptyset$ for all $j \in N$,
       $N_i \ne \emptyset$ for all $i \in M$.

The worst case performance is measured by the smallest bound $Q$ satisfying $Z_{heu}/Z_{opt} \le Q$, where $Z_{heu}$ and $Z_{opt}$ are the values of the heuristic and optimal solutions. Due to our assumptions in (2), there exists at least one feasible solution and $Z_{opt} > 0$ holds. The ratio $Z_{heu}/Z_{opt}$ is well-defined.

2.  Algorithm  I 

The class of heuristic algorithms that we consider is a generalization of the greedy heuristic. In essence, the heuristic sets the value of one variable at a time until a cover is found. Each variable is evaluated according to its cost and the number of rows that it may cover. We let $R_r$ be the set of uncovered rows before the $r$th variable is chosen by the heuristic, $S(x)$ be the support of the cover to be found and $k_{rj}$ be the number of additional rows variable $j$ can cover. We call this class of heuristics Algorithm I.

Step 0  Let $R_1 = M$, $S(x) = \emptyset$ and $r = 1$. Go to 1.

Step 1  If $R_r = \emptyset$, go to 2.

        Otherwise, define $k_{rj} = |M_j \cap R_r|$ for all $j \in N$. Let $j^* \in N$ be such that

            $f(c_{j^*}, k_{rj^*}) = \min_{j \in N} f(c_j, k_{rj})$.

        In case of a tie, a fixed but arbitrary tie breaking rule is used. Set

            $S(x) \leftarrow S(x) \cup \{j^*\}$
            $R_{r+1} \leftarrow R_r \setminus M_{j^*}$
            $r \leftarrow r+1$

        and go to 1.

Step 2  Let $x_j = 1$ if $j \in S(x)$, $x_j = 0$ otherwise, and stop.
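The steps of Algorithm I can be sketched in Python as follows. This is only an illustrative sketch: the function name `algorithm_one`, the dictionary-based data layout and the smallest-index tie breaker are assumptions made for the example and are not part of the original description.

```python
from fractions import Fraction


def algorithm_one(costs, columns, f, rows):
    """Sketch of Algorithm I: repeatedly pick the column minimising f(c_j, k_rj).

    costs   -- dict j -> c_j (> 0)
    columns -- dict j -> set M_j of rows covered by column j
    f       -- evaluation function f(c, k), only called with k >= 1
    rows    -- iterable M of all row indices
    Ties are broken by the smallest column index (an assumed rule).
    """
    uncovered = set(rows)              # R_r
    support = []                       # S(x), in the order chosen
    while uncovered:                   # Step 1
        best = None
        for j in sorted(columns):
            k = len(columns[j] & uncovered)    # k_rj
            if k == 0:
                continue               # f(c_j, 0) = +infinity: never chosen
            key = (f(costs[j], k), j)
            if best is None or key < best:
                best = key
        j_star = best[1]
        support.append(j_star)
        uncovered -= columns[j_star]   # R_{r+1} = R_r \ M_{j*}
    return support                     # Step 2: x_j = 1 exactly for j in S(x)


def greedy(c, k):
    """The greedy special case f(c, k) = c / k."""
    return Fraction(c) / k


costs = {1: 1, 2: 1, 3: 3}
columns = {1: {1, 2}, 2: {3}, 3: {1, 2, 3}}
cover = algorithm_one(costs, columns, greedy, rows={1, 2, 3})
print(cover, sum(costs[j] for j in cover))
```

Passing a different `f` gives a different member of the class; the loop itself does not change.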

A function $f$ is used to evaluate the variables. Each choice of $f$ corresponds to a different heuristic. For obvious reasons, we require

       $f(c_j, 0) = +\infty$,

so that a variable that covers no additional row is never chosen. Otherwise, we consider any $f : R_+ \times Z_+ \to R$ where $R_+$ is the set of positive real numbers representing $c_j$ and $Z_+$ is the set of positive integers representing $k_{rj}$.

The greedy heuristic that Chvatal considered in [2] is a special case of Algorithm I when $f(c_j, k_{rj}) = c_j/k_{rj}$. The tight worst case bound that Chvatal derived is

(3)    $Z_{heu}/Z_{opt} \le H(d)$

where $H(d) = \sum_{j=1}^{d} \frac{1}{j}$ and $d = \max_{j \in N} |M_j|$.

This bound is dependent on problem size, density and the distribution of the nonzero coefficients in the matrix A. The function $H(d)$ is, in turn, bounded above by $1 + \ln d$, so it grows only logarithmically in $d$. Similar results for special classes of problems were obtained previously by Johnson [4] and Lovasz [6].
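As a small numerical illustration of how slowly $H(d)$ grows, the following sketch tabulates $H(d)$ next to the standard upper estimate $1 + \ln d$. The helper name `harmonic` is an assumption made for this example.

```python
import math
from fractions import Fraction


def harmonic(d):
    """H(d) = sum_{j=1}^{d} 1/j, computed exactly."""
    return sum(Fraction(1, j) for j in range(1, d + 1))


for d in (1, 2, 3, 10, 100):
    h = float(harmonic(d))
    # columns: d, H(d), 1 + ln d
    print(d, round(h, 4), round(1 + math.log(d), 4))
```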
Any fixed but arbitrary tie breaking rule may be used. The tie breaker may use any data that is available, including $c_j$ and $k_{rj}$. Without loss of generality, we assume that the tie breaking rule is different from the function $f$ used so that if there exist $j_1, j_2 \in N$ with $j_1 \ne j_2$, $k_{rj_1} \ne k_{rj_2}$ but $f(c_{j_1}, k_{rj_1}) = f(c_{j_2}, k_{rj_2})$, the tie breaker will break the tie. When all rules fail, we allow breaking ties arbitrarily by the location of ones so that a variable can always be chosen. A good example will be to choose $j_1$ if $j_1 < j_2$.

In the next theorem, we show that the worst case performance of any heuristic in Algorithm I is dominated by that of the greedy heuristic. We also use the symbol $\le$, when used in

       $f(c_{j_1}, k_{rj_1}) \le f(c_{j_2}, k_{rj_2})$,

to indicate either

       $f(c_{j_1}, k_{rj_1}) < f(c_{j_2}, k_{rj_2})$

or

       $f(c_{j_1}, k_{rj_1}) = f(c_{j_2}, k_{rj_2})$ but the tie breaker chooses $j_1$.

Theorem 1

Assume Algorithm I is used. There is no function $f$ that gives a worst case bound strictly better than $H(d)$ for any $d \ge 1$.

Proof

By contradiction. Notice that the theorem is trivial when $d = 1$, as $Z_{heu} \ge Z_{opt}$ implies $Z_{heu}/Z_{opt} \ge H(1)$. We assume $f$ is a function that, when used in Algorithm I, gives

(4)    $Z_{heu}/Z_{opt} \le Q_d < H(d)$

for some $d \ge 2$.

We consider two cases.
Case 2  (5) does not hold for some $d \ge 2$.

Without loss of generality, let $d$ be the smallest integer so that (5) does not hold. We prove by induction on $d$.

Subcase 2.1  $d = 2$

The negation of (5) gives

(7)    $f(a, 1) \le f(2a, 2)$    some $a > 0$.

(8)    Let $c = a$ if $f(a, 1) \le f(2a, 1)$
           $c = 2a$ otherwise

and consider

       Min $c x_1 + 2a x_2 + 2a x_3$
s.t.
       $x_j + x_3 \ge 1$    $j = 1, 2$
       $x_j = 0, 1$    $j = 1, 2, 3$

From (7) and (8), the heuristic chooses $x_1$ first. Then, regardless of which variable the heuristic chooses next, we have $Z_{heu} = 2a + c$. The optimal solution is $x_1 = x_2 = 0$, $x_3 = 1$ with

       $\frac{Z_{heu}}{Z_{opt}} = \frac{2a + c}{2a} \ge \frac{3}{2} = H(2)$

which contradicts (4).
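The three-variable instance of Subcase 2.1 can be checked numerically for the greedy function $f(c, k) = c/k$, for which (7) holds with equality. The sketch below takes $a = 1$ (so $c = a$ in (8)); the helper names and the smallest-index tie breaker are assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations


def run_heuristic(costs, columns, f):
    """Algorithm I with ties broken by the smallest index (assumed rule)."""
    uncovered = set().union(*columns.values())
    chosen = []
    while uncovered:
        j_star = min(
            (j for j in columns if columns[j] & uncovered),
            key=lambda j: (f(costs[j], len(columns[j] & uncovered)), j),
        )
        chosen.append(j_star)
        uncovered -= columns[j_star]
    return chosen


def optimum(costs, columns):
    """Brute-force optimal cover value (fine for tiny instances only)."""
    rows = set().union(*columns.values())
    best = None
    for size in range(1, len(columns) + 1):
        for subset in combinations(columns, size):
            if set().union(*(columns[j] for j in subset)) == rows:
                value = sum(costs[j] for j in subset)
                best = value if best is None else min(best, value)
    return best


a = 1
costs = {1: a, 2: 2 * a, 3: 2 * a}        # c = a because f(a, 1) <= f(2a, 1)
columns = {1: {1}, 2: {2}, 3: {1, 2}}     # rows: x_j + x_3 >= 1 for j = 1, 2


def greedy(c, k):
    return Fraction(c, k)


chosen = run_heuristic(costs, columns, greedy)
z_heu = sum(costs[j] for j in chosen)
z_opt = optimum(costs, columns)
ratio = Fraction(z_heu, z_opt)
print(chosen, z_heu, z_opt, ratio)
```

Under these assumptions the heuristic picks $x_1$ and then $x_2$, and the ratio equals $H(2) = 3/2$.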

Subcase 2.2  $d \ge 3$.

Since (5) holds for $d-1$ but not for $d$, we have

(9)    $f(\frac{da}{p}, d) \ge f(a, p)$    some $a > 0$, some $p \in \{1,\ldots,d-1\}$

and

(10)   $f(\frac{c(d-1)}{j}, d-1) < f(c, j)$    all $c > 0$, $j = 1,\ldots,d-2$.

Claim $f(\frac{da}{p}, d) \ge f(\frac{a(d-1)}{p}, d-1)$. If $p = d-1$ in (9), it is trivial. If $p < d-1$, (9) and using $j = p$, $c = a$ in (10) give

       $f(\frac{da}{p}, d) \ge f(a, p) > f(\frac{a(d-1)}{p}, d-1)$.

Let $c = \frac{da}{p}$ and (10) can be generalized to

(11)   $f(\frac{c(d-1)}{j}, d-1) \le f(c, j)$    $j = 1,\ldots,d-2, d$.

Let $k_j \in \{1,\ldots,j\}$ for all $j = 1,\ldots,d$ be such that

(12)   $f(\frac{c(d-1)}{k_j}, d-1) \le f(\frac{c(d-1)}{k}, d-1)$    $k = 1,\ldots,j$.


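Consider the problem

       Min $\sum_{j=1}^{d} \frac{c(d-1)}{k_j} x_j + \sum_{i=1}^{d-1} c\, y_i$
s.t.
       $x_j + y_i \ge 1$    $i = 1,\ldots,d-1$, $j = 1,\ldots,d$ except for $i = 1$, $j = d$
       $\lambda x_{d-1} + x_d + y_1 \ge 1$
       $x_j, y_i = 0$ or $1$    $i = 1,\ldots,d-1$, $j = 1,\ldots,d$

where
       $\lambda = 1$ if $k_{d-1} = d-1$
       $\lambda = 0$ otherwise.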


2.2.1  $\lambda = 0$ and $k_{d-1} < d-1$

From (11) and (12), $x_d$ is chosen first. From (12), $x_{d-1}$ is then chosen over $x_1,\ldots,x_{d-2}$. Since all $y_i$'s have $d-1$ ones left and $k_{d-1} < d-1$,

       $f(\frac{c(d-1)}{k_{d-1}}, d-1) \le f(\frac{c(d-1)}{d-1}, d-1) = f(c, d-1)$

and $x_{d-1}$ is chosen over $y_1,\ldots,y_{d-1}$. From (11) and (12), the heuristic will then choose $x_{d-2},\ldots,x_1$ to complete the solution.

2.2.2  $\lambda = 1$ and $k_{d-1} = d-1$

From (12), $x_d$ is chosen over $x_1,\ldots,x_{d-2}$ first. As $x_{d-1}$ has cost $c$, $x_{d-1}$ is identical to all $y_i$'s. From (11) and (12), $x_d$ is also chosen over $x_{d-1}$, $y_1,\ldots,y_{d-1}$. From (12), $x_{d-1}$ is chosen over $x_1,\ldots,x_{d-2}$. Since $x_{d-1}$ is identical to all $y_i$'s except for the location of ones, all tie breaking rules will fail but we can always rearrange the matrix so that $x_{d-1}$ is chosen arbitrarily. From (11) and (12), the heuristic will choose $x_{d-2},\ldots,x_1$ sequentially.

In either case, the heuristic solution is $x_j = 1$, $y_i = 0$, $j = 1,\ldots,d$, $i = 1,\ldots,d-1$. The optimal solution is $x_j = 0$, $y_i = 1$ for all $i = 1,\ldots,d-1$, $j = 1,\ldots,d$ with

       $\frac{Z_{heu}}{Z_{opt}} = \frac{\sum_{j=1}^{d} \frac{c(d-1)}{k_j}}{c(d-1)} \ge \sum_{j=1}^{d} \frac{1}{j}$    as $k_j \le j$

which contradicts (4).

Q.E.D.
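To make the construction of Subcase 2.2 concrete, the sketch below builds the instance for the greedy function $f(c, k) = c/k$, where (12) gives $k_j = j$ and hence $\lambda = 1$, and takes $c = 1$. The tie breaker used here (prefer an $x$ variable, then the larger index) is an assumption standing in for the rearrangement argument of the proof; the optimal value $c(d-1)$ is taken from the proof rather than recomputed.

```python
from fractions import Fraction


def build_instance(d):
    """Subcase 2.2 instance with c = 1 and k_j = j (greedy case, lambda = 1)."""
    costs, columns = {}, {}
    for j in range(1, d + 1):
        costs[("x", j)] = Fraction(d - 1, j)
        columns[("x", j)] = {(i, j) for i in range(1, d) if (i, j) != (1, d)}
    for i in range(1, d):
        costs[("y", i)] = Fraction(1)
        columns[("y", i)] = {(i, j) for j in range(1, d + 1) if (i, j) != (1, d)}
    # replacement row: lambda * x_{d-1} + x_d + y_1 >= 1 with lambda = 1
    for col in (("x", d - 1), ("x", d), ("y", 1)):
        columns[col].add("special")
    return costs, columns


def greedy_with_tiebreak(costs, columns):
    """Greedy heuristic; ties go to x variables, then to the larger index."""
    uncovered = set().union(*columns.values())
    order = []
    while uncovered:
        def key(col):
            kind, index = col
            k = len(columns[col] & uncovered)
            return (costs[col] / k, kind != "x", -index)

        candidates = [col for col in columns if columns[col] & uncovered]
        choice = min(candidates, key=key)
        order.append(choice)
        uncovered -= columns[choice]
    return order


def harmonic(d):
    return sum(Fraction(1, j) for j in range(1, d + 1))


for d in (3, 4, 5):
    costs, columns = build_instance(d)
    order = greedy_with_tiebreak(costs, columns)
    z_heu = sum(costs[col] for col in order)
    z_opt = Fraction(d - 1)            # all y_i = 1, as stated in the proof
    print(d, [index for _, index in order], z_heu / z_opt, harmonic(d))
```

With these assumptions the heuristic selects $x_d, x_{d-1}, \ldots, x_1$ in that order and the ratio printed equals $H(d)$.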


The  cover  obtained  from  Algorithm  I  is  not  necessarily  prime. 

It  is  possible  to  implement  a  simple  procedure  to  derive  a  prime  cover 
from  the  heuristic  solution.  See,  for  example,  [1],  [3].  The  value  of  the 
prime  cover  and  consequently  the  worst  case  behavior  may  improve.  The  next 
theorem  shows  that,  with  some  general  assumption  on  the  tie  breaking  rule, 
the  result  of  Theorem  1  still  holds. 
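One simple way to derive a prime cover, sketched under the assumption that redundant columns are examined from most to least expensive, is shown below. The procedures in [1] and [3] may differ; the function name `make_prime` and the removal order are illustrative choices only.

```python
def make_prime(cover, columns, costs):
    """Drop redundant columns (most expensive first) until the cover is prime."""
    kept = list(cover)
    for j in sorted(cover, key=lambda j: costs[j], reverse=True):
        rows = set().union(*(columns[h] for h in kept))
        others = [columns[h] for h in kept if h != j]
        # j is redundant if the remaining columns still cover every row
        if others and set().union(*others) == rows:
            kept.remove(j)
    return kept


# Three-variable instance with costs a, 2a, 2a (a = 1): the cover {x_1, x_3}
# is not prime because x_3 alone already covers both rows.
costs = {1: 1, 2: 2, 3: 2}
columns = {1: {1}, 2: {2}, 3: {1, 2}}
print(make_prime([1, 3], columns, costs))
```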
We assume that, if a tie exists, the tie breaking rule is based on $c_j$ and $k_{rj}$ only so that the tie breaker will work whenever we have two variables with $c_{j_1} \ne c_{j_2}$. In case of a second tie, we allow breaking ties utilizing additional data that is available, including breaking ties arbitrarily.

Theorem 2

Assume a procedure is used to strengthen a solution from Algorithm I to prime. Assume further that the tie breaker is as previously described. There is no function $f$ that gives a worst case bound strictly better than $H(d)$.

Proof

It suffices to show that all heuristic solutions are prime. Notice first that, with the changes in the tie breaker, Algorithm I will give the same solutions for all counter examples in the proof of Theorem 1. Since the heuristic solutions are prime except for subcase 2.1, it suffices to consider subcase 2.1 only. We have $d = 2$ and

(13)   $f(a, 1) \le f(2a, 2)$    for some $a > 0$.

Let $\Delta = \min(a,\ 2a(H(2) - Q_2))$ and consider

       Min $\sum_{j=1}^{3} c_j x_j$
s.t.
       $x_j + x_3 \ge 1$    $j = 1, 2$
       $x_j = 0, 1$    $j = 1, 2, 3$

=  1  with 


1 


-u- 


heu 

Z  _ 
opt 


3a 

2a+0 


3a 


0  ,  A 

2*+qJ 


H(2) 


H(2) -Q. 


1+ 


=  Q, 


which contradicts (4).


Subcase 2.2  $f(2a+\theta, 2) \le f(a, 1)$

This is case 1 in the proof of Theorem 1.

Consider

       Min $(2a+\theta) x_1 + a x_2 + a y_1 + a y_2$
s.t.
       $x_j + y_i \ge 1$    $i = 1, 2$, $j = 1, 2$
       $x_j, y_i = 0, 1$

The heuristic chooses $x_2$ arbitrarily and then $x_1$ for the prime cover $x_1 = x_2 = 1$, $y_1 = y_2 = 0$. The optimal solution is $y_1 = y_2 = 1$, $x_1 = x_2 = 0$ with

       $\frac{Z_{heu}}{Z_{opt}} = \frac{3a+\theta}{2a} > H(2) - \frac{\Delta}{2a}$

which contradicts (4).


Q.E.D. 


3.  Extensions of Algorithm I: Algorithm II.

The worst case bound for Algorithm I is dependent on problem size. We are therefore interested in finding other heuristic algorithms that may give a better worst case performance. Algorithm II is an extension of Algorithm I in that it also chooses one variable at a time. The difference is in step one, where the variable is chosen only from a subset of all variables that are available. More specifically, a row is chosen first and the heuristic then chooses a variable with a nonzero coefficient in that row. Algorithm II is computationally more efficient and, since all rows must be covered eventually, it chooses one variable to cover the row that is considered most essential first.

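Step 0  Let $R_1 = M$, $S(x) = \emptyset$, $r = 1$ and go to 1.

Step 1  If $R_r = \emptyset$, go to 2.

        Otherwise, define $k_{rj} = |M_j \cap R_r|$.

        (a) Pick $i_r \in R_r$.

        (b) Pick $j_r^* \in N_{i_r}$ so that

            $f(c_{j_r^*}, k_{r j_r^*}) = \min_{j \in N_{i_r}} f(c_j, k_{rj})$.

        Set $S(x) \leftarrow S(x) \cup \{j_r^*\}$
            $R_{r+1} \leftarrow R_r \setminus M_{j_r^*}$
            $r \leftarrow r+1$

        and go to 1.

Step 2  Let $x_j = 1$ if $j \in S(x)$, $x_j = 0$ otherwise.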
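A minimal Python sketch of Algorithm II follows. The row rule of step 1a is passed in as a function; the name `algorithm_two`, the `pick_row` signature and the smallest-index tie breaker are assumptions made for the example.

```python
from fractions import Fraction


def algorithm_two(costs, columns, f, pick_row):
    """Sketch of Algorithm II: pick a row first, then the best column in it.

    pick_row(uncovered, columns) implements step 1a and must return an
    uncovered row; ties in step 1b go to the smallest column index (assumed).
    """
    uncovered = set().union(*columns.values())                 # R_1 = M
    support = []                                               # S(x)
    while uncovered:
        k = {j: len(columns[j] & uncovered) for j in columns}  # k_rj
        i_r = pick_row(uncovered, columns)                     # step 1a
        candidates = [j for j in sorted(columns) if i_r in columns[j]]  # N_{i_r}
        j_star = min(candidates, key=lambda j: (f(costs[j], k[j]), j))  # step 1b
        support.append(j_star)
        uncovered -= columns[j_star]
    return support


def greedy(c, k):
    return Fraction(c, k)


costs = {1: 1, 2: 1, 3: 3}
columns = {1: {1, 2}, 2: {3}, 3: {1, 2, 3}}
# Placeholder row rule for the demonstration: lowest-numbered uncovered row.
support = algorithm_two(costs, columns, greedy, lambda uncovered, cols: min(uncovered))
print(support)
```

Only the columns in $N_{i_r}$ are evaluated at each step, which is where the computational saving over Algorithm I comes from.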

Different rules can be used to pick the row $i_r$ in step 1a. A different rule will correspond to a different class of heuristic algorithms.


Theorem 3

Assume $f(c_j, k_{rj}) = c_j/k_{rj}$. Regardless of the rule used in step 1a in Algorithm II, the worst case bound is

       $Z_{heu}/Z_{opt} \le dH(d)$

where $H(d) = \sum_{j=1}^{d} \frac{1}{j}$ and $d = \max_{j \in N} |M_j|$.


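Proof

Let $x^*$ and $\bar{x}$ be the optimal and heuristic solutions respectively and let $S(x) = \{j \in N \mid x_j = 1\}$. Since $x^*$ must be feasible, it covers the rows $i_r$, $r = 1,\ldots,k$ where $k = |S(\bar{x})|$. Then, there exists at least one $j(r) \in S(x^*)$ so that $j(r) \in N_{i_r}$ for every $r$. We have

       $\frac{c_{j_r^*}}{k_{r j_r^*}} \le \frac{c_{j(r)}}{k_{r j(r)}}$    $r = 1,\ldots,k$

       $c\bar{x} = \sum_{r=1}^{k} c_{j_r^*} \le \sum_{r=1}^{k} c_{j(r)} \frac{k_{r j_r^*}}{k_{r j(r)}} = \sum_{j \in S(x^*)} c_j \left( \sum_{r \in S_j} \frac{k_{r j_r^*}}{k_{rj}} \right)$

where $S_j = \{r \mid j(r) = j\}$.

$k_{r j_r^*} \le |M_{j_r^*}| \le d$ implies

       $Z_{heu} \le d \sum_{j \in S(x^*)} c_j \left( \sum_{r \in S_j} \frac{1}{k_{rj}} \right)$.

Claim that $\sum_{r \in S_j} \frac{1}{k_{rj}} \le \sum_{l=1}^{d} \frac{1}{l}$ for every $j \in S(x^*)$. For $j \in S(x^*)$ such that $|S_j| \le 1$, $\sum_{r \in S_j} \frac{1}{k_{rj}} \le 1 \le \sum_{l=1}^{d} \frac{1}{l}$ is trivial. It suffices to consider $j \in S(x^*)$ such that $|S_j| \ge 2$. Let $r_1, r_2 \in S_j$ with $r_1 \ne r_2$. Without loss of generality, $r_1 < r_2$. From the definitions of $S_j$ and $j(r)$, $i_{r_1}, i_{r_2} \in M_j$. Since row $i_{r_1}$ is covered at step $r_1$,

       $k_{r_2 j} = |M_j \cap R_{r_2}| \le |M_j \cap R_{r_1}| - 1 = k_{r_1 j} - 1$.

In general, $k_{r_1 j} \ne k_{r_2 j}$ for all $r_1, r_2 \in S_j$ and $r_1 \ne r_2$. As $k_{rj} \in \{1,\ldots,|M_j|\}$,

       $\sum_{r \in S_j} \frac{1}{k_{rj}} \le \sum_{l=1}^{|M_j|} \frac{1}{l} \le \sum_{l=1}^{d} \frac{1}{l} = H(d)$.

Substitution yields

       $Z_{heu} \le d \sum_{j \in S(x^*)} c_j \left( \sum_{r \in S_j} \frac{1}{k_{rj}} \right) \le dH(d) \sum_{j \in S(x^*)} c_j = dH(d)\, Z_{opt}$.

Q.E.D.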


Different rules can be used in Step 1a. Two specific rules are considered here. In the first rule, we pick a row of minimum cardinality so that a row with fewer potential candidates to be chosen from is covered first. We call it Algorithm II.1. In the second rule, a penalty for choosing a wrong variable is computed. The row with the largest penalty is chosen first. The penalty for every row is the difference between the two smallest functional values. We call this Algorithm II.2. The details are outlined as follows.

Algorithm II.1

Step 1a  Pick $i_r$ by

       $|N_{i_r}| = \min_{i \in R_r} |N_i|$

Algorithm II.2

Step 1a

(i)  For $i \in R_r$ such that $|N_i| = 1$, define $P(i) = +\infty$.

(ii) For $i \in R_r$ such that $|N_i| \ge 2$, define $j_1^i, j_2^i \in N_i$ such that

       $f(c_{j_1^i}, k_{r j_1^i}) \le f(c_{j_2^i}, k_{r j_2^i}) \le f(c_j, k_{rj})$    $j \in N_i \setminus \{j_1^i, j_2^i\}$

     and $P(i) = f(c_{j_2^i}, k_{r j_2^i}) - f(c_{j_1^i}, k_{r j_1^i})$.

Pick $i_r$ by

       $P(i_r) = \max_{i \in R_r} P(i)$
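The two row rules can be sketched as follows. The function names, the argument lists and the choice of the lowest-numbered row when several rows tie are assumptions made for this example; the small instance is chosen only to show that the two rules can select different rows.

```python
import math
from fractions import Fraction


def row_min_cardinality(uncovered, columns):
    """Algorithm II.1: the uncovered row with the fewest candidate columns."""
    def n_i(i):
        return sum(1 for j in columns if i in columns[j])      # |N_i|
    return min(sorted(uncovered), key=n_i)


def row_max_penalty(uncovered, costs, columns, f):
    """Algorithm II.2: the row whose two smallest f values differ the most."""
    def penalty(i):
        values = sorted(
            f(costs[j], len(columns[j] & uncovered))
            for j in columns if i in columns[j]
        )
        # a row with a single candidate gets P(i) = +infinity
        return math.inf if len(values) == 1 else values[1] - values[0]
    return max(sorted(uncovered), key=penalty)


def greedy(c, k):
    return Fraction(c, k)


costs = {1: 2, 2: 2, 3: 6, 4: 5}
columns = {1: {1, 2}, 2: {2, 3}, 3: {1, 3}, 4: {1}}
uncovered = {1, 2, 3}
print(row_min_cardinality(uncovered, columns))            # row chosen by II.1
print(row_max_penalty(uncovered, costs, columns, greedy))  # row chosen by II.2
```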


Remark 4

If $f(c_j, k_{rj}) = c_j/k_{rj}$ is used in either Algorithm II.1 or II.2, the bound $dH(d)$ is tight.

Proof  Let $Z_{heu}/Z_{opt} \le Q_d < dH(d)$. Consider, for any $d \ge 2$,


$c > \varepsilon > 2\delta > 0$, $\delta < \frac{c(dH(d) - Q_d)}{2Q_d}$,

       Min $\sum_{j=1}^{d} \frac{cd}{j} x_j + (c+\delta) x_{d+1} + \sum_{j=1}^{d} (c+\varepsilon) y_j + \sum_{j=1}^{d+2} \delta z_j$
s.t.


xj  +  xd+i  +  tL  *i  i  1 


d+2 

x  +  I  z.  >  1 
J  i-i  i  ~ 

XJ*  yJ’  ZJ  "  1 


j*l . d 


The rows and columns chosen are $d, d-1, \ldots, 1$ and $x_d, x_{d-1}, \ldots, x_1$ in that order. An optimal solution is $x_j = 0$, $y_j = 0$, $j = 1,\ldots,d$, $x_{d+1} = 1$, $z_1 = 1$, $z_j = 0$, $j = 2,\ldots,d+2$. We have

       $\frac{Z_{heu}}{Z_{opt}} = \frac{\sum_{j=1}^{d} \frac{cd}{j}}{c+2\delta} = \frac{cdH(d)}{c+2\delta} > \frac{dH(d)}{1 + \frac{dH(d)-Q_d}{Q_d}} = Q_d$

for the contradiction.

Theorem 5

Assume either Algorithm II.1 or II.2 is used. There is no function $f$ that gives a worst case bound strictly better than $H(d)$, for any $d \ge 1$.

Proof

The proof is the same as that of Theorem 1. The heuristic will choose a sequence of rows such that the same variables and, hence, the same solution in all counter examples are chosen. Notice that $|N_i|$ is the same for all rows and, if Algorithm II.1 is used, the rows can be chosen arbitrarily for the desired result.

Q.E.D.


4.  Conclusion

Two general classes of heuristic algorithms were considered. They are easy to implement as the variables are evaluated essentially on the cost and the number of rows that can be covered. The worst case performances of all heuristics are dominated by the function $H(d)$ which, in turn, is bounded above by $1 + \ln d$ and is dependent on problem size and distribution. A different approach would be needed in order to find a heuristic that gives a better worst case bound. From the proofs, the worst case bounds for different functions and, hence, heuristics are attained in different examples. A combination of some functions may improve the average performance. A computational study is available in [1].